INTRODUCTION: Despite tremendous advances in mutation detection with gene panels and exome sequencing, the causative alleles of the majority of high-risk breast cancer families have not been identified in the protein-coding region of the genome. We hypothesize that the critical mutations in these families lie in unknown regulatory regions of the genome. Through whole genome sequence analysis of severely affected families, combined with functional annotation and experimental evidence, we plan to identify new mutational mechanisms that predispose to breast cancer. Our ultimate goal is to make information on newly identified mutations and mutational mechanisms useful to clinicians and to women and their families.

KEYWORDS: Breast cancer, BRCA1, BRCA2, whole genome sequencing, promoter, enhancer, transcription factor binding site, gene regulation, mutation.

OVERALL PROJECT SUMMARY: We describe below our progress during Year 1 on each of the Tasks outlined in the approved Statement of Work (SOW). All Tasks are performed at the same research location and are the joint responsibility of both the Initiating PI (Tom Walsh, PhD) and the Partnering PI (Mary-Claire King, PhD).

TASK 1. Perform whole genome sequencing of germline DNA from 100 breast cancer patients selected from 30 severely affected families.

1a. Prepare 100 standard paired-end libraries with 300-400 bp inserts (months 1-3)

1b. Prepare 100 mate-pair libraries with tightly defined 6 kb inserts (months 1-3)

1c. Sequence the paired-end and mate-pair libraries on a HiSeq 2500 (months 2-9)

We have prepared both library types and generated sequencing data for all 100 breast cancer patients.
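Paired-end and mate-pair libraries help with rearrangement detection (Task 2b) because read pairs whose mapped distance departs from the designed insert size flag candidate structural events. The sketch below illustrates that idea under simplifying assumptions: the insert size is approximated as the distance between the two read start positions, read orientation is ignored, and both the 5-7 kb mate-pair window and the `slack` tolerance are hypothetical values, not parameters of our pipeline. The `ReadPair` record and `classify_pair` function are likewise illustrative names.

```python
from dataclasses import dataclass

# Designed insert-size windows (bp) per library type. The paired-end window is
# taken from Task 1a; the 5-7 kb window around the 6 kb mate-pair design is a
# hypothetical tolerance.
EXPECTED_INSERT = {
    "paired_end": (300, 400),
    "mate_pair": (5_000, 7_000),
}


@dataclass
class ReadPair:
    library: str  # "paired_end" or "mate_pair"
    chrom1: str
    pos1: int     # start of read 1
    chrom2: str
    pos2: int     # start of read 2


def classify_pair(pair: ReadPair, slack: float = 0.5) -> str:
    """Classify a mapped read pair by comparing its span to the library design.

    The span is approximated as the distance between read start positions,
    and read orientation is ignored. `slack` (hypothetical default) widens
    the expected window to tolerate normal library-size variation.
    """
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"  # candidate translocation
    lo, hi = EXPECTED_INSERT[pair.library]
    lo, hi = lo * (1 - slack), hi * (1 + slack)
    span = abs(pair.pos2 - pair.pos1)
    if span < lo:
        return "too_short"  # reads closer than expected: candidate insertion
    if span > hi:
        return "too_long"   # reads farther apart than expected: candidate deletion
    return "concordant"
```

A real caller would also require several independent discordant pairs supporting the same breakpoints before reporting an event; the larger mate-pair inserts make that support easier to collect for events spanning repeats.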

TASK 2. Annotate sequenced genome variants with respect to population frequency and overlap with ENCODE regions.

2a. Align reads to the reference sequence (months 4-10)

2b. Identify SNPs, indels, CNVs and rearrangements with bioinformatic tools (months 4-10)

2c. Filter variants from Task 2b against publicly available databases to remove common events (months 4-10)

2d. Filter rare and private variants from Task 2c within families to obtain segregating variants (months 4-10)

2e. Compare surviving events from Task 2d to ENCODE regions (months 4-10)

2f. Further restrict variants from Task 2e to ENCODE regions mapped only in breast tissues or cell lines (months 4-10)
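Steps 2c-2f form a cascade in which each stage keeps only the survivors of the previous one. The sketch below assumes each variant already carries its highest allele frequency across public databases and the set of sequenced relatives who carry it; the 0.001 frequency cutoff, the `Variant` record and `run_cascade` are hypothetical, and regions are treated as 1-based inclusive intervals. Recording the survivor count at each stage yields a per-level summary of the kind described for Table 1.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    chrom: str
    pos: int                  # 1-based position
    ref: str
    alt: str
    pop_af: float             # highest allele frequency seen in public databases
    carriers: set = field(default_factory=set)  # sequenced relatives carrying it


def in_regions(v, regions):
    """True if v falls in any (chrom, start, end) region, 1-based inclusive."""
    return any(c == v.chrom and s <= v.pos <= e for c, s, e in regions)


def run_cascade(variants, affected, encode_all, encode_breast, max_af=0.001):
    """Apply steps 2c-2f in order and record how many variants survive each."""
    counts = {"input": len(variants)}
    # 2c: remove variants that are common in population databases
    stage = [v for v in variants if v.pop_af <= max_af]
    counts["rare"] = len(stage)
    # 2d: keep variants carried by every sequenced affected relative
    stage = [v for v in stage if affected <= v.carriers]
    counts["segregating"] = len(stage)
    # 2e: keep variants overlapping any ENCODE region
    stage = [v for v in stage if in_regions(v, encode_all)]
    counts["encode"] = len(stage)
    # 2f: restrict to ENCODE regions mapped in breast tissues or cell lines
    stage = [v for v in stage if in_regions(v, encode_breast)]
    counts["encode_breast"] = len(stage)
    return stage, counts
```

The linear scan in `in_regions` is fine for a handful of regions; genome-wide annotation sets would call for a sorted or interval-tree index instead.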

We have developed a functional annotation approach to filter variants from the whole genome sequences. Table 1 below summarizes the variant output from one of the severely affected breast cancer families.

Table 1. Distribution of different types of mutations at different filtering levels in the whole genome sequencing data of two patients from Family 1041, a severely affected breast cancer family.
We have filtered variants from the breast cancer families against those in the 1000 Genomes Project¹ and, more recently, against those described in the Genome of the Netherlands². We further narrowed the variant list in each of the 30 families by filtering out genomic segments that the affected relatives do not share identical by descent (termed IBD0). Figure 1 shows the non-IBD0 regions in Family 1041.
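If all affected women in a family carry the same inherited risk allele (that is, assuming no phenocopies), the allele must lie where every pair of affected relatives shares at least one haplotype identical by descent; any segment in which some affected pair is IBD0 can be excluded. A minimal sketch, assuming IBD0 segments have already been inferred by an upstream tool and pooled over all affected pairs: it merges the segments per chromosome and uses binary search to drop variants inside them. The input layout and function names are hypothetical.

```python
import bisect


def merge_intervals(intervals):
    """Merge overlapping or adjacent (start, end) intervals, 1-based inclusive."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def build_ibd0_index(ibd0_segments):
    """ibd0_segments: {chrom: [(start, end), ...]} pooled over all affected pairs."""
    index = {}
    for chrom, segs in ibd0_segments.items():
        merged = merge_intervals(segs)
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_ibd0(index, chrom, pos):
    """Binary search for the last segment starting at or before pos."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect.bisect_right(starts, pos) - 1
    return i >= 0 and pos <= ends[i]


def keep_shared(variants, ibd0_segments):
    """Keep (chrom, pos) variants that lie outside every IBD0 segment."""
    index = build_ibd0_index(ibd0_segments)
    return [(c, p) for c, p in variants if not in_ibd0(index, c, p)]
```

Merging first guarantees the segments are sorted and non-overlapping, which is what makes the single `bisect_right` lookup correct.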


Figure 1. Non-IBD0 regions for Family 1041. The largest region overlaps BRCA1 on chromosome 17.
After applying the non-IBD0 sharing constraint, we categorized the remaining variants with respect to the following features, developed through the various projects of ENCODE³:

1) Genomic location: gene upstream, 5’UTR, 1st exon and intron

2) Histone marks, such as H3K9ac and H3K27ac

3) DNase I hypersensitivity data

4) ChIP-seq signals

5) Conservation of the variant and its flanking bases

6) Effect of the variant on the transcription factor binding site motif score, computed using position weight matrices.
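Feature 6 compares how well the reference and variant alleles match a transcription factor motif. A minimal sketch of that comparison, assuming a base-count matrix for the factor is available and a uniform background: counts are converted to a log2-odds position weight matrix, every motif window overlapping the variant is scored, and the best reference score is subtracted from the best variant score. Only the forward strand is scanned for brevity (a full scan would also score the reverse complement), sequences are assumed to be uppercase ACGT and at least one motif length long, and the function names and pseudocount are hypothetical.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.25, background=0.25):
    """Convert per-position base counts into a log2-odds position weight matrix.

    counts: one dict per motif position, e.g. {"A": 3, "C": 0, "G": 1, "T": 6}.
    A uniform background and a small pseudocount are assumed.
    """
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((col.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def window_score(pwm, seq):
    """Sum of per-position log-odds for a sequence exactly as long as the motif."""
    return sum(pwm[i][base] for i, base in enumerate(seq))


def best_score(pwm, seq, var_idx):
    """Highest forward-strand score among motif windows covering var_idx."""
    w = len(pwm)
    first = max(0, var_idx - w + 1)
    last = min(var_idx, len(seq) - w)
    return max(window_score(pwm, seq[s:s + w]) for s in range(first, last + 1))


def delta_motif_score(pwm, ref_seq, var_idx, alt):
    """Variant minus reference best score; negative means the site is weakened."""
    alt_seq = ref_seq[:var_idx] + alt + ref_seq[var_idx + 1:]
    return best_score(pwm, alt_seq, var_idx) - best_score(pwm, ref_seq, var_idx)
```

For example, with a toy four-base motif TTAC, changing the A in GGTTACGG to C gives a negative delta, while a change at the first base, which no motif match covers, leaves the score unchanged.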

We show in Figure 2 an example of a variant from breast cancer Family 1041 that was shared by all women with breast cancer in the family and scored highly in our filtering scheme.
Figure 2. A C>A variant at chr17:41,227,852 identified in Family 1041. The variant lies within a potential regulatory region upstream of BRCA1 and alters the fifth base of the conserved HLF binding site. The region is conserved in five mammalian species.
TASK 3. Characterize potential regulatory variants.

3a. Generate enhancer constructs with wild-type and variant regulatory regions (months 8-16)

3b. Transfect constructs into cell lines and monitor luciferase activity (months 8-16)

3c. Measure gene expression in patients’ lymphoblasts (months 8-16)

We have made mutant and wild-type constructs for 11 different potential noncoding regulatory mutations, including chr17:41,227,852 C>A from Family 1041, and are currently assessing their luciferase activities. Direct mRNA measurements of the genes associated with these variants are also under way in patients’ lymphoblasts.
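The report does not specify how luciferase readings are normalized or compared. One common design, assumed here, is a dual-luciferase assay in which firefly signal from the test construct is divided by signal from a co-transfected Renilla control, and variant and wild-type constructs are then compared by fold change and Welch's t statistic across replicate wells. The function names are hypothetical; a p-value can be taken from the t distribution with Welch-Satterthwaite degrees of freedom (for example, `scipy.stats.ttest_ind(..., equal_var=False)`).

```python
import math
from statistics import mean, stdev


def normalized_activity(firefly, renilla):
    """Per-well firefly/Renilla ratio (assumes a co-transfected Renilla control)."""
    return [f / r for f, r in zip(firefly, renilla)]


def fold_change(variant, wildtype):
    """Mean normalized activity of the variant construct relative to wild type."""
    return mean(variant) / mean(wildtype)


def welch_t(a, b):
    """Welch's t statistic for unequal variances (needs >= 2 replicates each)."""
    se2 = stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / math.sqrt(se2)
```

For example, wild-type wells reading 100, 110 and 90 and variant wells reading 50, 55 and 45, each against a Renilla reading of 10, give a fold change of 0.5 and a negative t statistic, consistent with reduced enhancer activity.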


TASK 4. Resequence mutant regulatory regions in a large series of patients to identify additional mutations

4a. Design molecular inversion probes (MIPs) for promising regulatory regions (months 12-24)

4b. Perform MIP amplification, hybridization and sequencing (months 12-24)

4c. Annotate variants within regulatory regions with respect to frequency (months 12-24)

4d. Perform statistical analysis of variants (months 12-24)

We have begun generating candidate lists for resequencing and will commence Task 4 shortly.
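Task 4d does not yet fix a statistical test. One common choice for rare variants, assumed here for illustration, is to compare the number of carriers among patients with the number in a control series or population reference using Fisher's exact test. The sketch implements the two-sided test with only the standard library (`math.comb` requires Python 3.8 or later); the function name and table layout are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: patients, controls. Columns: carriers, non-carriers.
    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # probability that x of the col1 carriers fall in the patient row
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # small relative tolerance so tied tables are not lost to rounding error
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
```

For example, 3 carriers among 3 patients and none among 3 controls gives p = 0.1, a reminder of how large the resequenced series must be before single-region comparisons become informative.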
KEY RESEARCH ACCOMPLISHMENTS:

• Development of an approach that functionally annotates whole genome sequencing data to identify shared, rare and potentially regulatory variants, a major contribution toward achieving our goals.


CONCLUSION: In Year 1 of this project we generated all the whole genome sequencing data necessary to achieve our goal of identifying new mutational mechanisms of breast cancer predisposition. We have developed a functional annotation approach that can pinpoint potential regulatory mutations, and we are beginning to compile a list of variants that will be further evaluated experimentally.

Our objective of identifying new mutational mechanisms of breast cancer predisposition is on track and can be achieved with further analysis and the experiments outlined in our SOW Tasks.


PUBLICATIONS, ABSTRACTS, AND PRESENTATIONS: At the end of Year 1 we have not submitted any manuscripts, but we anticipate doing so in Year 2 and presenting our findings at scientific meetings in 2015.


INVENTIONS, PATENTS AND LICENSES: Nothing to report


REPORTABLE OUTCOMES: Nothing to report at this stage


OTHER ACHIEVEMENTS: Nothing to report